Improving speech intelligibility in background noise by SII-dependent amplification and compression
Abstract
In many speech communication applications, high intelligibility is essential for good communication. In these applications, however, speech is often disturbed by additive noise and/or reverberation, so it is desirable to develop algorithms that maintain high intelligibility in such conditions. Amplifying the speech to achieve a good signal-to-noise ratio (SNR) is a simple approach, but it is often not applicable because of technical limitations of the amplification system or unpleasantly high sound levels. Consequently, algorithms that increase speech intelligibility while keeping the signal power equal are preferable. Several algorithms have been proposed that use frequency-dependent amplification, dynamic range compression, transient amplification, or modulation filtering techniques. The first attempt to investigate the effect of different signal processing strategies on speech intelligibility was made by Licklider and Pollack [1]. Although they did not consider additive noise or reverberation, they demonstrated that speech intelligibility in quiet is not necessarily affected by strategies such as high-pass or low-pass filtering and clipping. Niederjohn and Grotelueschen [2] proposed a preprocessing algorithm that uses high-pass filtering followed by static rapid amplitude compression. They observed an increase in speech intelligibility for preprocessed speech in white noise over unprocessed speech at the same SNR. Zorila et al. [4] adopted dynamic range compression as a means to increase speech intelligibility. They used a static input-output characteristic for the compression, preceded by several frequency-dependent amplification steps. Recently, Sauert and Vary proposed an algorithm that uses time- and frequency-dependent amplification of the speech signal, aiming to maximize the speech intelligibility index (SII) [3].
However, this approach suffers from spectral adaptation to the background noise. Therefore, in a more recent approach they considered an SNR-dependent transition between SII-weighted and unity-weighting of the speech signal [6]. In this contribution we describe an algorithm and its
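The equal-power idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the algorithm of any of the cited works: the function name `apply_band_gains_equal_power`, the band edges, and the fixed gains are all hypothetical choices made for illustration, and no SII estimate is involved. The sketch only shows how frequency-dependent amplification can be followed by a rescaling step so that the processed signal has the same mean power as the input.

```python
import numpy as np

def apply_band_gains_equal_power(x, fs, band_edges_hz, gains_db):
    """Apply per-band gains in the frequency domain, then rescale so the
    output has the same mean power as the input (equal-power constraint).

    Illustrative only: band edges and gains are assumed inputs, not values
    taken from the cited papers.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

    # One linear gain per band; bins outside all bands keep a gain of 1.
    gain = np.ones_like(freqs)
    bands = zip(band_edges_hz[:-1], band_edges_hz[1:])
    for (lo, hi), g_db in zip(bands, gains_db):
        gain[(freqs >= lo) & (freqs < hi)] = 10.0 ** (g_db / 20.0)

    y = np.fft.irfft(spectrum * gain, n=len(x))

    # Renormalise so that mean(y**2) equals mean(x**2).
    power_in = np.mean(x ** 2)
    power_out = np.mean(y ** 2)
    if power_out > 0.0:
        y *= np.sqrt(power_in / power_out)
    return y

# Usage: boost the band above 1 kHz by 12 dB without changing total power.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)
y = apply_band_gains_equal_power(x, fs, [0, 1000, 8000], [0.0, 12.0])
```

Because the total power is held fixed, boosting the high band necessarily lowers the level of the low band; the methods discussed in the abstract differ in how they choose that redistribution.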
Similar resources
Improving speech intelligibility in noise by SII-dependent preprocessing using frequency-dependent amplification and dynamic range compression
In this contribution, a new preprocessing algorithm to improve speech intelligibility in noise is proposed, which maintains the signal power before and after processing. The proposed AdaptDRC algorithm consists of two time- and frequency-dependent stages, which are both functions of the estimated SII. The first stage applies a time- and frequency-dependent amplification, while the second stage appl...
Speech-in-noise enhancement using amplification and dynamic range compression controlled by the speech intelligibility index.
In many speech communication applications, such as public address systems, speech is degraded by additive noise, leading to reduced speech intelligibility. In this paper a pre-processing algorithm is proposed that is capable of increasing speech intelligibility under an equal-power constraint. The proposed AdaptDRC algorithm comprises two time- and frequency-dependent stages, i.e., an amplifica...
Can modified casual speech reach the intelligibility of clear speech?
Clear speech is a speaking style adopted by speakers in an attempt to maximize the clarity of their speech and is proven to be more intelligible than casual speech. This work focuses on modifying casual speech to sound as intelligible as clear speech. First, we examine the role of speaking rate for intelligibility. Clear and casual speech signals are time-scale stretched, matching the average d...
SII-based speech preprocessing for intelligibility improvement in noise
A linear time-invariant filter is designed in order to improve speech understanding when the speech is played back in a noisy environment. To accomplish this, the speech intelligibility index (SII) is maximized under the constraint that the speech energy is held constant. A nonlinear approximation is used for the SII such that a closed-form solution exists to the constrained optimization proble...
A binaural microscopic model based on a modulation filter bank for predicting speech intelligibility in normal-hearing listeners
In this study, a binaural microscopic model for the prediction of speech intelligibility based on the modulation filter bank is introduced. So far, the spectral criteria such as the STI and SII or other analytical methods have been used in the binaural models to determine the binaural intelligibility. In the proposed model, unlike all models of binaural intelligibility prediction, an automatic ...